---
title: Analyze feature associations
dataset_name: N/A
description: How to use a Feature Association matrix to visualize relationships among your features.
domain: platform
expiration_date: 10-10-2024
owner: izzy@datarobot.com
url: docs.datarobot.com/docs/tutorials/prep-learning-data/analyze-feature-associations.html

---

# Analyze feature associations {: #analyze-feature-associations }

In this tutorial, you'll learn how to use a [Feature Association](feature-assoc) matrix to visualize relationships among your numeric and categorical features. You can quickly see the top ten associations and the clusters that are present in your data.

??? tip "How are feature associations calculated?"
    By default, feature associations are calculated using Mutual Information, but you can switch to Cramer's V. Learn [more about these metrics](feature-assoc#more-about-metrics) in the [Feature Association documentation](feature-assoc).
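
To build intuition for the two metrics, the following is a minimal sketch of their textbook definitions for a pair of categorical features, using only the Python standard library. The function names `mutual_information` and `cramers_v` are illustrative, not part of any DataRobot API, and DataRobot's own calculation may differ (for example, in how it bins numeric features or normalizes scores).

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Textbook mutual information (in nats) between two categorical sequences."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    joint_counts = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in joint_counts.items():
        p_xy = count / n
        p_x = x_counts[x] / n
        p_y = y_counts[y] / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


def cramers_v(xs, ys):
    """Textbook Cramer's V (no bias correction) between two categorical sequences."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    joint_counts = Counter(zip(xs, ys))
    # Chi-squared statistic: compare observed counts to counts expected under independence.
    chi2 = 0.0
    for x, x_count in x_counts.items():
        for y, y_count in y_counts.items():
            expected = x_count * y_count / n
            observed = joint_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    if k == 0:
        return 0.0  # A constant feature carries no association.
    return math.sqrt(chi2 / (n * k))


# Invented toy values: the second feature is fully determined by the first.
a = ["yes", "yes", "no", "no"]
b = ["up", "up", "none", "none"]
print(mutual_information(a, b), cramers_v(a, b))
```

Both scores are zero when the two features are independent and grow as one feature becomes more predictable from the other.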

## Takeaways {: #takeaways }

This tutorial shows how to:

* View the Feature Associations matrix.
* Investigate feature relationships including pairs and clusters of features.

## View the Feature Associations tab {: #view-the-feature-associations-tab }

The **Feature Associations** tab is available after your features are analyzed in EDA2.

The sample dataset featured in this tutorial contains patient data.

![](images/tu-data-dataset.png)

The goal is to predict the likelihood of patient readmission to the hospital. The target feature is `readmitted`.

1. On the **Begin a project** page, upload your data, specify a target, and click **Start**.

    DataRobot performs EDA2 prior to generating model blueprints.

2. Once DataRobot finishes feature analysis, click **Feature Associations** on the **Data** tab.

    ![](images/tu-feat-assoc-select.png)

    The Feature Associations matrix displays.

    ![](images/tu-feat-assoc-associations.png)

    The features are listed on the x and y axes of the matrix. The association between two features (a *feature pair*) is represented by a colored dot.

3. Investigate clusters of features.

    ![](images/tu-feat-assoc-clusters.png)

    *Feature clusters* are groups of features that are associated to some degree. The dots in a cluster display in the same general color with association strength represented by the depth of the color&mdash;dark (opaque) to light (more transparent). White dots indicate features that are not in a cluster.

    Notice the green, red, and blue clusters identified in the chart. The red cluster contains the `change`, `diabetesMed`, `insulin`, and `metformin` features. It makes sense that these features are in a cluster because they all relate to diabetes medications: insulin and metformin are diabetes medications, while the `change` feature indicates that the patient's medication was changed.

4. To zoom in, drag the cursor to outline a section of the matrix.

    ![](images/tu-feat-assoc-zoom.png)

    To view the whole matrix again, click **Reset zoom** below the display.

5. Explore the features by sorting them using the **Sort By** dropdown menu.

    ![](images/tu-feat-assoc-sort-by.png)

    By default, the list is sorted by **Feature Cluster**. You can also sort by name and [Importance](model-ref#importance-score).

6. Use the **Feature List** dropdown menu to view the feature associations based on a different feature list.

    ![](images/tu-feat-assoc-feature-list.png)
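
Conceptually, the matrix holds one association score for every feature pair, and the strongest pairs are the ones worth investigating first. The sketch below illustrates that idea outside of DataRobot: it scores every pair of columns with textbook mutual information and ranks the pairs. The column values are invented for illustration, the helper `mutual_information` is hypothetical, and this is not how DataRobot itself builds the matrix.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Textbook mutual information (in nats) between two categorical sequences."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    joint_counts = Counter(zip(xs, ys))
    return sum(
        (count / n) * math.log((count / n) / ((x_counts[x] / n) * (y_counts[y] / n)))
        for (x, y), count in joint_counts.items()
    )


# Toy columns; the values are invented and do not come from the tutorial dataset.
columns = {
    "diabetesMed": ["Yes", "Yes", "No", "No", "Yes", "No"],
    "insulin": ["Up", "Steady", "No", "No", "Up", "No"],
    "gender": ["F", "M", "F", "M", "M", "F"],
}

# Score each unordered feature pair once, then sort strongest first.
pairs = sorted(
    (
        (mutual_information(columns[a], columns[b]), a, b)
        for a, b in combinations(columns, 2)
    ),
    reverse=True,
)

top_pairs = pairs[:10]  # Keep at most the ten strongest associations.
for score, a, b in top_pairs:
    print(f"{a} / {b}: {score:.3f}")
```

In this toy data, `diabetesMed` and `insulin` rank first because the `insulin` value fully determines the `diabetesMed` value.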

## Explore pairs of features {: #explore-pairs-of-features }

1. Select a dot in the matrix to view details about the feature pair.

    ![](images/tu-feat-assoc-feature-pair.png)

    The **Associations** tab on the right shows the cluster that contains the feature pair, as well as the value for the selected metric (Mutual Information, in this case). The tab also provides details about the individual features.

2. Click **View Feature Association Pairs** at the bottom of the **Associations** tab.

    ![](images/tu-feat-assoc-feature-view-pairs.png)

    The window displays a visualization of the association between the two features.

    ![](images/tu-feat-assoc-feature-pair-visualization.png)

    In this case, both features are categorical, so a contingency table shows the frequency distribution of the feature values. For other feature types, different plots display.

3. Select other pairs of features from the **Feature 1** and **Feature 2** dropdown menus.

    For pairs of numeric features, DataRobot generates scatter plots.

    ![](images/tu-feat-assoc-scatter-plot.png)

    If a pair includes a numeric feature and a categorical feature, DataRobot generates a box and whisker plot.

    ![](images/tu-feat-assoc-box-whisker-plot.png)

    In this example, the feature pair of `admission_type_id` (a categorical feature) and `time_in_hospital` (a numeric feature) generates a box and whisker plot. The box shows the upper and lower quartiles for the data, and the whisker endpoints represent the upper and lower extremes.
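
To make the numbers behind these visualizations concrete, here is a minimal standard-library sketch that computes a contingency table for two categorical features and a per-category box plot summary for a categorical and numeric pair. The helper names and the sample values are hypothetical, and the quartile method (`inclusive`) is an assumption; DataRobot may compute its plots differently.

```python
import statistics
from collections import Counter


def contingency_table(xs, ys):
    """Count how often each combination of values from two categorical features occurs."""
    counts = Counter(zip(xs, ys))
    rows = sorted(set(xs))
    cols = sorted(set(ys))
    return {r: {c: counts.get((r, c), 0) for c in cols} for r in rows}


def box_summary(values):
    """Five-number summary: extremes, quartiles, and median (needs at least 2 values)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def box_summary_by_category(categories, values):
    """Group numeric values by category, then summarize each group."""
    groups = {}
    for category, value in zip(categories, values):
        groups.setdefault(category, []).append(value)
    return {category: box_summary(group) for category, group in groups.items()}


# Invented sample values, for illustration only.
admission_type_id = ["Emergency", "Emergency", "Elective", "Elective", "Emergency"]
time_in_hospital = [1, 5, 2, 4, 3]
readmitted = ["yes", "no", "no", "no", "yes"]

print(contingency_table(admission_type_id, readmitted))
print(box_summary_by_category(admission_type_id, time_in_hospital))
```

Some charting tools draw whiskers at 1.5 times the interquartile range rather than at the extremes; this sketch uses the minimum and maximum to match the description above.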

## Learn more {: #learn-more }

**Documentation:**

* [Feature Association tab](feature-assoc)
* [Importing data into DataRobot](import-data/index)
* [EDA2](eda-explained#eda2)
